action segmentation
OnlineTAS: An Online Baseline for Temporal Action Segmentation
Temporal context plays a significant role in temporal action segmentation. In an offline setting, the context is typically captured by the segmentation network after observing the entire sequence. However, capturing and using such context information in an online setting remains an under-explored problem.
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation
Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite their apparent relevance and potential complementarity, these two problems have been investigated as separate and distinct tasks. In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion. The key idea to unification is to train the model to effectively handle both visible and invisible parts of the sequence in an integrated manner; the visible part is for temporal segmentation, and the invisible part is for future anticipation. To this end, we introduce a new anticipative masking strategy during training in which a late part of the video frames is masked as invisible, and learnable tokens replace these frames to learn to predict the invisible future. Experimental results demonstrate the bi-directional benefits between action segmentation and anticipation. ActFusion achieves state-of-the-art performance across the standard benchmarks of 50 Salads, Breakfast, and GTEA, outperforming task-specific models on both tasks with a single unified model through joint learning.
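The anticipative masking described above lends itself to a short illustration. The sketch below is not the authors' released code: the function name, the 0.7 visible ratio, and per-frame features of shape (T, D) with a single learnable mask token are illustrative assumptions.

```python
import torch

def anticipative_mask(frame_feats: torch.Tensor,
                      mask_token: torch.Tensor,
                      visible_ratio: float = 0.7):
    """Mask a late portion of the sequence and substitute a learnable token.

    frame_feats: (T, D) per-frame features of one video.
    mask_token:  (D,) learnable embedding standing in for invisible frames.
    Returns the masked sequence and a boolean mask marking invisible frames.
    """
    T, _ = frame_feats.shape
    n_visible = int(T * visible_ratio)      # prefix kept for temporal segmentation
    masked = frame_feats.clone()
    masked[n_visible:] = mask_token         # late frames become "future" queries
    invisible = torch.zeros(T, dtype=torch.bool)
    invisible[n_visible:] = True            # losses on these positions train anticipation
    return masked, invisible

# Usage: segment the visible prefix, anticipate the masked suffix.
feats = torch.randn(64, 256)
token = torch.nn.Parameter(torch.zeros(256))
masked_feats, future_mask = anticipative_mask(feats, token, visible_ratio=0.7)
```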
Towards Open-World Human Action Segmentation Using Graph Convolutional Networks
Xing, Hao, Boey, Kai Zhe, Cheng, Gordon
Human-object interaction segmentation is a fundamental task in daily activity understanding, playing a crucial role in applications such as assistive robotics, healthcare, and autonomous systems. While most existing learning-based methods excel in closed-world action segmentation, they struggle to generalize to open-world scenarios where novel actions emerge. Collecting exhaustive action categories for training is impractical due to the dynamic diversity of human activities, necessitating models that detect and segment out-of-distribution actions without manual annotation. To address this issue, we formally define the open-world action segmentation problem and propose a structured framework for detecting and segmenting unseen actions. Our framework introduces three key innovations: 1) an Enhanced Pyramid Graph Convolutional Network (EPGCN) with a novel decoder module for robust spatiotemporal feature upsampling; 2) Mixup-based training that synthesizes out-of-distribution data, eliminating reliance on manual annotations; and 3) a novel Temporal Clustering loss that groups in-distribution actions while distancing out-of-distribution samples. We evaluate our framework on two challenging human-object interaction recognition datasets: the Bimanual Actions and 2 Hands and Object (H2O) datasets. Experimental results demonstrate significant improvements over state-of-the-art action segmentation models across multiple open-set evaluation metrics, achieving 16.9% and 34.6% relative gains in open-set segmentation (F1@50) and out-of-distribution detection (AUROC), respectively. Additionally, we conduct an in-depth ablation study to assess the impact of each proposed component, identifying the optimal framework configuration for open-world action segmentation.
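The Mixup-based synthesis of out-of-distribution data can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the Beta(alpha, alpha) mixing coefficient and equal-length sequences are assumptions.

```python
import torch

def mixup_ood(x_a: torch.Tensor, x_b: torch.Tensor, alpha: float = 0.4):
    """Synthesize a pseudo out-of-distribution sample by mixing two
    in-distribution action sequences from different classes.

    x_a, x_b: (T, D) feature sequences of two distinct known actions,
              assumed here to have the same length.
    Returns the mixed sequence, which is treated as OOD during training,
    so no manual annotation of unseen actions is required.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x_a + (1.0 - lam) * x_b

# Usage: pair samples from different action classes and label the mix as OOD.
seq_a, seq_b = torch.randn(100, 128), torch.randn(100, 128)
ood_seq = mixup_ood(seq_a, seq_b)
```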
Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation
Xing, Hao, Boey, Kai Zhe, Wu, Yuankai, Burschka, Darius, Cheng, Gordon
Accurate temporal segmentation of human actions is critical for intelligent robots in collaborative settings, where a precise understanding of sub-activity labels and their temporal structure is essential. However, the inherent noise in both human pose estimation and object detection often leads to over-segmentation errors, disrupting the coherence of action sequences. To address this, we propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 30 fps) motion data (skeleton and object detections) to mitigate fragmentation. Our framework introduces three key contributions. First, a sinusoidal encoding strategy that maps 3D skeleton coordinates into a continuous sin-cos space to enhance spatial representation robustness. Second, a temporal graph fusion module that aligns multi-modal inputs with differing resolutions via hierarchical feature aggregation. Third, inspired by the smooth transitions inherent to human actions, we design SmoothLabelMix, a data augmentation technique that mixes input sequences and labels to generate synthetic training examples with gradual action transitions, enhancing temporal consistency in predictions and reducing over-segmentation artifacts. Extensive experiments on the Bimanual Actions Dataset, a public benchmark for human-object interaction understanding, demonstrate that our approach outperforms state-of-the-art methods, especially in action segmentation accuracy, achieving F1@10: 94.5% and F1@25: 92.8%. Human action segmentation, the task of temporally decomposing continuous activities into coherent sub-action units, is a cornerstone of intelligent robotic systems operating in collaborative environments.
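The sinusoidal encoding of skeleton coordinates might look roughly like the sketch below. The frequency schedule and output layout are assumptions, since the abstract only states that 3D coordinates are mapped into a continuous sin-cos space.

```python
import numpy as np

def sincos_encode_joints(joints: np.ndarray, num_freqs: int = 4):
    """Map 3D joint coordinates into a continuous sin-cos space.

    joints: (T, J, 3) skeleton sequence (T frames, J joints, xyz).
    Each coordinate c is expanded to [sin(2^k * c), cos(2^k * c)] for
    k = 0..num_freqs-1, giving a smoother, noise-tolerant representation.
    The frequency schedule here is illustrative, not taken from the paper.
    """
    freqs = 2.0 ** np.arange(num_freqs)                          # (F,)
    scaled = joints[..., None] * freqs                           # (T, J, 3, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(joints.shape[0], joints.shape[1], -1)     # (T, J, 6*F)

# Usage on a 30 fps skeleton stream of 25 joints over 10 seconds.
skeleton = np.random.randn(300, 25, 3)
encoded = sincos_encode_joints(skeleton)   # shape (300, 25, 24)
```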
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Kim, Joochan, Jung, Minjoon, Zhang, Byoung-Tak
Action recognition models have achieved promising results in understanding instructional videos. However, they often rely on dominant, dataset-specific action sequences rather than true video comprehension, a problem that we define as ordinal bias. To address this issue, we propose two effective video manipulation methods: Action Masking, which masks frames of frequently co-occurring actions, and Sequence Shuffling, which randomizes the order of action segments. Through comprehensive experiments, we demonstrate that current models exhibit significant performance drops when confronted with nonstandard action sequences, underscoring their vulnerability to ordinal bias. Our findings emphasize the importance of rethinking evaluation strategies and developing models capable of generalizing beyond fixed action patterns in diverse instructional videos. Due to the dominant action pair 'Take-Background', the model fails to predict the action 'Open'. Action recognition in instructional videos has witnessed remarkable progress, primarily driven by models that excel in curated benchmark datasets (Farha & Gall, 2019; Ishikawa et al., 2021; Li et al., 2020; Yi et al., 2021).
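Sequence Shuffling, as described, can be illustrated with a small sketch. The (start, end, label) segment representation and the frame-list interface are assumptions for illustration, not the paper's code.

```python
import random
from typing import List, Tuple

def sequence_shuffle(frames: list,
                     segments: List[Tuple[int, int, str]],
                     seed: int = 0):
    """Randomize the order of annotated action segments in a video.

    frames:   list of per-frame items (features, images, ...).
    segments: (start, end, label) triples covering the video, end exclusive.
    Returns reordered frames and per-frame labels; a model that leans on
    habitual action orderings (ordinal bias) should degrade on such inputs.
    """
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)
    new_frames, new_labels = [], []
    for idx in order:
        start, end, label = segments[idx]
        new_frames.extend(frames[start:end])
        new_labels.extend([label] * (end - start))
    return new_frames, new_labels
```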
From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
Zhang, Jiajie, Schwertfeger, Sören, Kleiner, Alexander
We present a novel unsupervised framework to unlock vast unlabeled human demonstration data from continuous industrial video streams for Vision-Language-Action (VLA) model pre-training. Our method first trains a lightweight motion tokenizer to encode motion dynamics, then employs an unsupervised action segmenter leveraging a novel "Latent Action Energy" metric to discover and segment semantically coherent action primitives. The pipeline outputs both segmented video clips and their corresponding latent action sequences, providing structured data directly suitable for VLA pre-training. Evaluations on public benchmarks and a proprietary electric motor assembly dataset demonstrate effective segmentation of key tasks performed by humans at workstations. Further clustering and quantitative assessment via a Vision-Language Model confirm the semantic coherence of the discovered action primitives. To our knowledge, this is the first fully automated end-to-end system for extracting and organizing VLA pre-training data from unstructured industrial videos, offering a scalable solution for embodied AI integration in manufacturing.
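The abstract does not define the "Latent Action Energy" metric, so the sketch below is only a hypothetical stand-in showing where such a metric would sit in the pipeline: it scores the change between consecutive latent motion tokens and cuts segments where that score settles.

```python
import numpy as np

def segment_by_latent_energy(latents: np.ndarray, threshold: float = 1.0,
                             min_len: int = 5):
    """Split a stream of latent motion tokens into primitive segments.

    latents: (T, D) latent tokens from a motion tokenizer.
    As a stand-in for the paper's undefined "Latent Action Energy", this
    sketch uses the norm of the change between consecutive latents and
    places a boundary wherever that energy drops below a threshold,
    i.e., wherever the encoded motion momentarily settles.
    """
    energy = np.linalg.norm(np.diff(latents, axis=0), axis=1)   # (T-1,)
    boundaries = [0]
    for t, e in enumerate(energy, start=1):
        if e < threshold and t - boundaries[-1] > min_len:
            boundaries.append(t)
    boundaries.append(len(latents))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

# Usage: cut a 1000-step latent stream into candidate action primitives.
clips = segment_by_latent_energy(np.random.randn(1000, 64))
```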